 relative similarity


Rethinking Graph Contrastive Learning through Relative Similarity Preservation

arXiv.org Artificial Intelligence

Graph contrastive learning (GCL) has achieved remarkable success by following the computer vision paradigm of preserving absolute similarity between augmented views. However, this approach faces fundamental challenges in graphs due to their discrete, non-Euclidean nature -- view generation often breaks semantic validity and similarity verification becomes unreliable. Through analyzing 11 real-world graphs, we discover a universal pattern transcending the homophily-heterophily dichotomy: label consistency systematically diminishes as structural distance increases, manifesting as smooth decay in homophily graphs and oscillatory decay in heterophily graphs. We establish theoretical guarantees for this pattern through random walk theory, proving label distribution convergence and characterizing the mechanisms behind different decay behaviors. This discovery reveals that graphs naturally encode relative similarity patterns, where structurally closer nodes exhibit collectively stronger semantic relationships. Leveraging this insight, we propose RELGCL, a novel GCL framework with complementary pairwise and listwise implementations that preserve these inherent patterns through collective similarity objectives. Extensive experiments demonstrate that our method consistently outperforms 20 existing approaches across both homophily and heterophily graphs, validating the effectiveness of leveraging natural relative similarity over artificial absolute similarity.
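The decay pattern described in this abstract can be illustrated with a small measurement: for each hop distance k, compute the fraction of node pairs exactly k hops apart that share a label. The following is a minimal sketch, not the authors' code. The function names, the toy path graph, and its labels are all hypothetical and chosen only for illustration.

```python
from collections import deque


def hop_distances(adj, source):
    """BFS shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def label_consistency_by_hop(adj, labels, max_hop):
    """Fraction of (ordered) node pairs at exactly k hops that share a label."""
    same = [0] * (max_hop + 1)
    total = [0] * (max_hop + 1)
    for u in adj:
        for v, d in hop_distances(adj, u).items():
            if 1 <= d <= max_hop:
                total[d] += 1
                same[d] += int(labels[u] == labels[v])
    return {k: same[k] / total[k] for k in range(1, max_hop + 1) if total[k]}


# Hypothetical toy data: a path graph 0-1-2-3-4-5 with two label blocks.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

consistency = label_consistency_by_hop(adj, labels, max_hop=3)
for hop, value in consistency.items():
    print(f"hop {hop}: label consistency {value:.2f}")
```

On this toy graph the consistency falls as the hop distance grows, which is the smooth-decay case; it says nothing about the oscillatory behavior the abstract reports for heterophily graphs.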


Graph Multi-Similarity Learning for Molecular Property Prediction

arXiv.org Artificial Intelligence

Accurate molecular property prediction relies on effective representation learning. It is crucial to incorporate diverse molecular relationships characterized by multi-similarity (self-similarity and relative similarities) between molecules. However, current molecular representation learning methods fall short in exploring multi-similarity and often underestimate the complexity of relationships between molecules. Additionally, previous multi-similarity approaches require the specification of positive and negative pairs to attribute distinct predefined weights to different relative similarities, which can introduce bias. In this work, we introduce the Graph Multi-Similarity Learning for Molecular Property Prediction (GraphMSL) framework, along with a novel approach to formulate a generalized multi-similarity metric without the need to define positive and negative pairs. In each of the chemical modality spaces under consideration (e.g., molecular depiction image, fingerprint, NMR, and SMILES), we first define a self-similarity metric (i.e., similarity between an anchor molecule and another molecule), and then transform it into a generalized multi-similarity metric for the anchor through a pair weighting function. GraphMSL validates the efficacy of the multi-similarity metric across MoleculeNet datasets. Furthermore, the metrics of all modalities are integrated into a multimodal multi-similarity metric, which shows potential to improve performance. Moreover, the focus of the model can be redirected or customized by altering the fusion function. Finally, GraphMSL proves effective in drug discovery evaluations through post-hoc analyses of the learnt representations.



Low-dimensional Data Embedding via Robust Ranking

arXiv.org Machine Learning

We describe a new method called t-ETE for finding a low-dimensional embedding of a set of objects in Euclidean space. We formulate the embedding problem as a joint ranking problem over a set of triplets, where each triplet captures the relative similarities between three objects in the set. By exploiting recent advances in robust ranking, t-ETE produces high-quality embeddings even in the presence of a significant amount of noise and better preserves local scale than known methods, such as t-STE and t-SNE. In particular, our method produces significantly better results than t-SNE on signature datasets while also being faster to compute.
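To make the triplet formulation concrete, the sketch below scores an embedding by the fraction of triplets it satisfies, where a triplet (i, j, k) is read as "object i is more similar to j than to k". This is only an evaluation helper under that assumed convention; it is not the t-ETE objective or its robust ranking loss, and the names and toy coordinates are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_satisfaction(embedding, triplets):
    """Fraction of triplets (i, j, k) for which i lies closer to j than to k."""
    satisfied = sum(
        sq_dist(embedding[i], embedding[j]) < sq_dist(embedding[i], embedding[k])
        for i, j, k in triplets
    )
    return satisfied / len(triplets)


# Hypothetical toy data: three objects on a line and three triplet constraints.
embedding = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0)}
triplets = [(0, 1, 2), (2, 1, 0), (1, 2, 0)]

rate = triplet_satisfaction(embedding, triplets)
print(f"satisfied triplets: {rate:.2f}")
```

The third triplet conflicts with the other two for these coordinates, so it plays the role of a noisy constraint that no embedding of these points on a line can satisfy together with the rest.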


Topic Modeling with Document Relative Similarities

AAAI Conferences

Topic modeling has been widely used in text mining. Previous topic models such as Latent Dirichlet Allocation (LDA) are successful in learning hidden topics, but they do not take into account the metadata of documents. To tackle this problem, many augmented topic models have been proposed to jointly model text and metadata. However, most existing models handle only categorical and numerical types of metadata. We identify another type of metadata that can be more natural to obtain in some scenarios: relative similarities among documents. In this paper, we propose a general model that links LDA with constraints derived from document relative similarities. Specifically, in our model, the constraints act as a regularizer of the log likelihood of LDA. We fit the proposed model using Gibbs-EM. Experiments with two real-world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and a document classification task.
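One way to picture constraints acting as a regularizer of the log likelihood is a penalty on document-topic distributions that grows whenever a relative-similarity triplet is violated. The sketch below is an illustration under stated assumptions, not the paper's formulation: the hinge form, the squared Euclidean distance between topic distributions, the `margin` and `weight` parameters, and all names are hypothetical, and the log-likelihood value is a placeholder rather than the output of a fitted model.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two topic distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def relative_similarity_penalty(theta, triplets, margin=0.0):
    """Hinge penalty for triplets (i, j, k): document i should be closer to j than to k."""
    penalty = 0.0
    for i, j, k in triplets:
        violation = margin + sq_dist(theta[i], theta[j]) - sq_dist(theta[i], theta[k])
        penalty += max(0.0, violation)
    return penalty


def regularized_objective(log_likelihood, theta, triplets, weight=1.0):
    """Log likelihood minus a weighted constraint penalty (assumed form)."""
    return log_likelihood - weight * relative_similarity_penalty(theta, triplets)


# Hypothetical document-topic distributions over two topics.
theta = {0: [0.9, 0.1], 1: [0.8, 0.2], 2: [0.1, 0.9]}

satisfied = relative_similarity_penalty(theta, [(0, 1, 2)])
violated = relative_similarity_penalty(theta, [(0, 2, 1)])
objective = regularized_objective(-10.0, theta, [(0, 2, 1)], weight=2.0)
print(satisfied, violated, objective)
```

A satisfied constraint contributes nothing, while a violated one lowers the objective in proportion to how badly it is violated, which is the general behavior a constraint-based regularizer is meant to have.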